Application of smart chemometric models for spectra resolution and determination of challenging multi-action quaternary mixture: statistical comparison with greenness assessment

A multivariate spectrophotometric method is a promising approach that can discriminate the spectra of components in complex matrices (e.g., pharmaceutical formulations), serving as an alternative to chromatography. Four green, smart multivariate spectrophotometric models were proposed and validated: principal component regression (PCR), partial least squares (PLS), multivariate curve resolution-alternating least squares (MCR-ALS), and artificial neural networks (ANN). The developed chemometric models were compared for their ability to resolve the highly overlapping spectra of Paracetamol (PARA), Chlorpheniramine maleate (CPM), Caffeine (CAF), and Ascorbic acid (ASC). The four multivariate calibration models were assessed via percent recoveries and root mean square error of prediction. The proposed models were applied efficiently with no need for any preliminary separation step. The models were used to analyze the studied components in their combined pharmaceutical formulation (Grippostad® C capsules). The Analytical GREEnness Metric Approach (AGREE) and eco-scale tools were applied to assess the greenness of the established models, giving scores of 0.77 and 85, respectively. Moreover, the proposed models were compared with the official methods and showed no considerable differences in accuracy and precision. Therefore, these models can be highly advantageous for routine pharmaceutical analysis of the studied substances in product testing laboratories. Supplementary Information: The online version contains supplementary material available at 10.1186/s13065-024-01148-9.


Introduction
Quality control analysis within the pharmaceutical industry requires determining several parameters for both raw materials and end products. The most widely used analytical technique for quality control of pharmaceutical products is high-pressure liquid chromatography (HPLC) [1]. However, HPLC is costly, labor-intensive, and time-consuming, and it generates hazardous waste. This has prompted interest in developing simple, green, and valid alternative techniques that produce precise and accurate results efficiently and with minimal human intervention [2].
Chemometrics is a well-known chemical discipline that uses mathematics, statistics, and formal logic to extract meaningful qualitative or quantitative information from chemical data [3]. Recently, chemometric models such as principal component regression (PCR), partial least squares (PLS), multivariate curve resolution-alternating least squares (MCR-ALS), and artificial neural networks (ANN) have generated considerable interest for the determination of multi-component preparations [4][5][6][7][8]. PCR and PLS are the most widely applied multivariate calibration approaches in chemometrics. These models enable the resolution of overlapped spectra, reduction of interference between signals, and minimization of noise [9]. Newer advanced models have been implemented for multivariate calibration in recent years, one of which is MCR-ALS. In this approach, ALS optimization is applied under several constraints that limit the possible solutions and improve the quantification of the compounds' concentration profiles [10,11]. ANNs are sophisticated models capable of emulating several cognitive processes of the human brain through diverse algorithms. They are deemed superior to conventional multivariate models for modelling both linear and nonlinear relationships between variables [11,12].
Green chemistry has become a main driving force in promoting sustainability in both laboratory and industrial settings. The twelve principles of Green Analytical Chemistry (GAC) developed by Anastas provide guidance for those interested in pursuing this approach [13]. Chemists from various fields, including organic chemistry, analytical chemistry, and chemical engineering, have provided a framework for implementing measures that increase the eco-friendliness of chemical materials and processes [14]. Most efforts to create more environmentally friendly chemical processes concentrate on employing cleaner, less hazardous, milder solvents or removing solvents altogether, as well as minimizing chemical reagents. Other efforts involve saving energy through the use of underivatized samples and employing raw materials derived from renewable resources [15].
Paracetamol (PARA), N-acetyl-p-aminophenol or acetaminophen, is commonly used as an antipyretic and analgesic drug. It is a painkiller that can alleviate cold symptoms including headache, earache, and joint pain. Additionally, it is effective in lowering a high body temperature [16,17] (Fig. 1). Chlorpheniramine maleate (CPM), (3RS propane-1-amine hydrogen (Z)-butenedioate, is an antihistaminic drug used to treat runny nose, sneezing, and watery eyes caused by the common cold or the flu [16,17] (Fig. 1). Caffeine (CAF), 1,3,7-Trimethyl-3,7-dihydro-1H-purine-2,6-dione, is a CNS stimulant that improves alertness and alleviates the malaise often associated with the common cold [16,17] (Fig. 1). Ascorbic acid (ASC) is an antioxidant that supports the body's immunity and helps fight colds [16]. It has a positive impact and plays a protective role in treating the new coronavirus disease [18]. It is chemically designated as (5R)-5-[(1S)-1,2-Dihydroxyethyl]-3,4-dihydroxyfuran-2(5H)-one [17] (Fig. 1).
A combination of PARA with CPM, CAF, and ASC is used to treat common symptoms such as headache, limb pain, rhinitis, and dry cough that occur with the common cold [19]. It is also an effective treatment for the same symptoms when they are associated with COVID-19 [20]. Although different chemometric assays have been reported for the determination of the investigated drugs, either in different binary mixtures [21][22][23] or in a quaternary mixture [24], a comprehensive literature review uncovered no evaluated chemometric models for resolving the spectra of the four drugs in their dosage form. In addition, the reported techniques did not consider greenness assessment. The present eco-friendly work seeks to reduce the use of potentially dangerous materials that harm the environment. Here, the AGREE and penalty-point scoring systems were employed to evaluate the greenness of the developed models. AGREE provides a thorough environmental assessment of the whole analytical procedure [25]. The eco-scale score of the established approaches was calculated by deducting penalty points from 100 [26]. This research aimed to apply green smart multivariate models to concurrently quantify PARA, CPM, CAF, and ASC in their dosage form, to quantitatively assess the efficiency of the established models, and to compare their performance using different statistical tests. The study was based on UV-Vis spectrophotometry as the analytical technique in combination with a non-linear model (ANN/RBF) and multivariate curve resolution.
Fig. 1 Chemical structure of the studied drugs

Chemicals and materials
PARA, CPM, CAF, and ASC powders were kindly supplied by the Egyptian Drug Authority (EDA), Egypt. Their purity was tested by the official British Pharmacopoeia method [17] for PARA and ASC and found to be 100.04 ± 1.26 and 100.04 ± 1.36, respectively, and by the official USP method [27] for CPM and CAF and found to be 100.00 ± 1.19 and 99.69 ± 1.73, respectively. Grippostad® C capsules (batch no. G52165), manufactured by STADA, Germany, were labeled to contain 200.00 mg PARA, 2.50 mg CPM, 25.00 mg CAF, and 150.00 mg ASC per capsule. Methanol was purchased from Sigma-Aldrich, Germany.

Standard solution
Stock standard solutions of PARA, CPM, CAF, and ASC (1.00 mg/mL each) were prepared by weighing 100.00 mg of each drug into four separate 100-mL volumetric flasks and adding methanol. The solutions were sonicated until dissolution and then completed to the mark with methanol. Working standard solutions of PARA, CPM, CAF, and ASC with a concentration of 100.00 µg mL⁻¹ were prepared from their corresponding stock standard solutions.

Spectral characteristics and absorption spectra
The absorption spectra of PARA, CPM, CAF, and ASC were measured over the range of 200.0-400.0 nm. The spectral data points from 220.0 to 300.0 nm were selected and transferred to MATLAB® for further data analysis.

Construction of calibration and validation sets
A five-level, four-factor calibration design [28] was employed to construct the calibration and validation sets. Twenty-five mixtures containing various concentrations of PARA, CPM, CAF, and ASC in the ranges 4.00-20.00, 1.00-9.00, 2.50-7.50, and 3.00-15.00 µg mL⁻¹, respectively, were used to design the calibration set. In 10-mL volumetric flasks, different aliquots of the working solutions were combined and diluted to the mark with methanol. The spectra of the resulting solutions were measured within the wavelength range of 200.0-400.0 nm. Using 1 nm intervals, the spectral data in the 220.0-300.0 nm range were imported into MATLAB for data manipulation and calibration model building. In the calculations, 81 experimental points were used. The spectral data were mean-centered prior to construction of the ANN, PLS, PCR, and MCR-ALS models. For both the PCR and PLS models, the number of latent variables (LVs) was optimized using leave-one-out cross-validation. Four LVs, which corresponded to the lowest calibration error, were optimal in both models. In the MCR-ALS model, non-negativity constraints were chosen, which force the concentrations to be zero or greater. In this study, we established a feed-forward model with Levenberg-Marquardt backpropagation as the ANN training algorithm [8]. To achieve an optimal network architecture, several elements require refinement: the number of nodes in the hidden layer, the learning rate, and the number of epochs. Four hidden neurons were found to be optimal when using a purelin-purelin transfer function. Further parameters were optimized as well, giving a learning rate of 0.1 and 100 epochs. The PCR, PLS, MCR-ALS, and ANN calibration models were constructed, and their predictive power was evaluated using a validation set consisting of five samples.
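The leave-one-out selection of the number of latent variables described above can be illustrated with a minimal NumPy sketch of PCR. This is not the authors' MATLAB code: the Gaussian "pure spectra", the randomly drawn mixtures (used here instead of the five-level, four-factor design), the noise level, and the function names `pcr_fit`, `pcr_predict`, and `loo_rmsecv` are all assumptions made for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)
wavelengths = np.arange(220, 301)            # 81 points at 1 nm intervals

# Hypothetical Gaussian bands standing in for the four pure spectra.
centers = np.array([230.0, 245.0, 262.0, 275.0])
S = 0.05 * np.exp(-0.5 * ((wavelengths - centers[:, None]) / 12.0) ** 2)  # (4, 81)

# 25 random mixtures inside the quoted concentration ranges (ug/mL).
# Assumption: random sampling replaces the five-level, four-factor design.
low = np.array([4.0, 1.0, 2.5, 3.0])
high = np.array([20.0, 9.0, 7.5, 15.0])
C = rng.uniform(low, high, size=(25, 4))
X = C @ S + rng.normal(scale=1e-3, size=(25, 81))   # simulated absorbances


def pcr_fit(X, Y, n_lv):
    """Mean-center, keep the first n_lv principal components, regress Y on the scores."""
    x_mean, y_mean = X.mean(axis=0), Y.mean(axis=0)
    Xc, Yc = X - x_mean, Y - y_mean
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    P = Vt[:n_lv].T                                  # loadings, (n_wavelengths, n_lv)
    T = Xc @ P                                       # scores, (n_samples, n_lv)
    Q, *_ = np.linalg.lstsq(T, Yc, rcond=None)       # regression of Y on scores
    return x_mean, y_mean, P @ Q                     # coefficients in wavelength space


def pcr_predict(model, X):
    x_mean, y_mean, B = model
    return (X - x_mean) @ B + y_mean


def loo_rmsecv(X, Y, n_lv):
    """Root mean square error of leave-one-out cross-validation over all analytes."""
    n = X.shape[0]
    press = 0.0
    for i in range(n):
        keep = np.arange(n) != i                     # leave sample i out
        model = pcr_fit(X[keep], Y[keep], n_lv)
        resid = pcr_predict(model, X[i:i + 1]) - Y[i:i + 1]
        press += float(np.sum(resid ** 2))
    return float(np.sqrt(press / (n * Y.shape[1])))


errors = {n_lv: loo_rmsecv(X, C, n_lv) for n_lv in range(1, 7)}
for n_lv, err in errors.items():
    print(n_lv, round(err, 4))
```

Because four absorbing species are simulated, the cross-validation error in this sketch should fall sharply once the fourth latent variable is included, which mirrors the kind of curve used to choose the number of LVs.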

Assay of pharmaceutical dosage form
The contents of ten Grippostad® C capsules were accurately emptied and weighed. An amount equivalent to the weight of one capsule was transferred into a 100-mL volumetric flask, 25 mL of methanol was added, and the solution was ultrasonicated for 30 min, then filtered into a 100-mL measuring flask and made up to the mark with methanol. A 0.50 mL aliquot of this solution was transferred into a 50-mL volumetric flask, spiked with 0.75 µg mL⁻¹ of the CPM standard working solution, and completed to volume with methanol. Based on the label claim, the resulting solution had a final concentration of 20.00 µg mL⁻¹ PARA, 1.00 µg mL⁻¹ CPM, 2.50 µg mL⁻¹ CAF, and 15.00 µg mL⁻¹ ASC.
The aforementioned procedures were used to analyze the pharmaceutical preparation with the proposed chemometric models. Subsequently, the concentrations of PARA, CPM, CAF, and ASC were determined.
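The dilution arithmetic behind the stated final concentrations can be checked with a short calculation. The sketch below assumes that the spike contributes 0.75 µg/mL of CPM to the final 50-mL solution, which is the reading consistent with the stated 1.00 µg/mL; the variable names are illustrative only.

```python
# Label claim per capsule (mg), as stated for Grippostad C capsules.
label_mg = {"PARA": 200.0, "CPM": 2.5, "CAF": 25.0, "ASC": 150.0}

FIRST_VOLUME_ML = 100.0    # one capsule weight extracted into 100 mL
ALIQUOT_ML = 0.50          # aliquot taken from the first solution
FINAL_VOLUME_ML = 50.0     # aliquot diluted to 50 mL
CPM_SPIKE_UG_PER_ML = 0.75 # assumed contribution of the spike to the final solution

final_ug_per_ml = {}
for drug, mg in label_mg.items():
    first = mg * 1000.0 / FIRST_VOLUME_ML                      # ug/mL in the 100-mL flask
    final_ug_per_ml[drug] = first * ALIQUOT_ML / FINAL_VOLUME_ML

final_ug_per_ml["CPM"] += CPM_SPIKE_UG_PER_ML                  # standard addition of CPM

for drug, conc in final_ug_per_ml.items():
    print(f"{drug}: {conc:.2f} ug/mL")
```

Without the spike, CPM would be present at 0.25 µg/mL, below the lowest calibration level of 1.00 µg/mL, which is why the standard addition is needed.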

Results and discussion
Multivariate analysis resolves severely overlapping spectroscopic data by simultaneously including numerous spectroscopic variables at different wavelengths, rather than relying, as univariate analysis does, on a single value corresponding to the maximum absorbance at a selected wavelength. This leads to an increase in specificity and sensitivity [29]. Moreover, multivariate calibrations are used efficiently in the analysis of biodiesel, plant extracts, and pharmaceutical formulations [30][31][32][33]. Herein, quantification of the combined drugs in Grippostad® C capsules by a univariate spectrophotometric method was hindered by the severe spectral overlap (Fig. 2). Hence, chemometric models (PCR, PLS, MCR-ALS, and ANN) were employed to quantify them successfully.

Calibration and validation set
The suggested models were optimized and calibrated using twenty-five mixtures (Table 1). The samples' absorbance data were scanned between 220.0 and 300.0 nm. This range was chosen because all components exhibit absorbance within it. Wavelengths below 220.0 nm were excluded to remove the influence of noise on the calibration matrix. Wavelengths above 300.0 nm were also excluded because they were regarded as carrying less informative absorbance data.

PCR and PLS models
PCR and PLS have attracted significant attention in chemometrics for multicomponent analysis. Determining which method is superior remains a challenging problem [34][35][36]. These models are particularly effective when only limited information is available regarding the components. PCR generates components to increase the interpretability of the data without taking the response variable into consideration. Conversely, PLS involves the response variable in its analysis and frequently produces models that require fewer components to fit the response variable [36]. Hence, PLS produces a more robust model by reducing interference in both the absorbance and concentration data. To establish the optimal number of latent variables, a leave-one-out cross-validation approach was applied (Fig. 3). In our investigation, four LVs proved to be optimal in both PCR and PLS.
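To make the contrast with PCR concrete, the following is a minimal sketch of PLS for a single analyte (PLS1, NIPALS-style), in which each weight vector is built from the covariance between the spectra and the concentration. It is an illustration under stated assumptions, not the authors' implementation: the simulated Gaussian spectra, random mixtures, noise level, and the names `pls1_fit` and `pls1_predict` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
wavelengths = np.arange(220, 301)                     # 81 points
centers = np.array([230.0, 245.0, 262.0, 275.0])      # hypothetical band maxima
S = 0.05 * np.exp(-0.5 * ((wavelengths - centers[:, None]) / 12.0) ** 2)
C = rng.uniform([4.0, 1.0, 2.5, 3.0], [20.0, 9.0, 7.5, 15.0], size=(25, 4))
X = C @ S + rng.normal(scale=1e-3, size=(25, 81))     # simulated mixture spectra


def pls1_fit(X, y, n_lv):
    """PLS1: latent variables are chosen to covary with the response y."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xk, yk = X - x_mean, y - y_mean
    W, P, q = [], [], []
    for _ in range(n_lv):
        w = Xk.T @ yk                 # weight vector uses the response (unlike PCR)
        w = w / np.linalg.norm(w)
        t = Xk @ w                    # scores
        tt = t @ t
        p = Xk.T @ t / tt             # X loadings
        qk = yk @ t / tt              # y loading
        Xk = Xk - np.outer(t, p)      # deflate X
        yk = yk - qk * t              # deflate y
        W.append(w)
        P.append(p)
        q.append(qk)
    W, P, q = np.array(W).T, np.array(P).T, np.array(q)
    b = W @ np.linalg.solve(P.T @ W, q)   # regression vector in wavelength space
    return x_mean, y_mean, b


def pls1_predict(model, X):
    x_mean, y_mean, b = model
    return (X - x_mean) @ b + y_mean


# Calibrate for the first analyte (PARA in this simulation) with four LVs.
model = pls1_fit(X, C[:, 0], n_lv=4)
fit_error = np.abs(pls1_predict(model, X) - C[:, 0])
print("largest calibration error (ug/mL):", float(fit_error.max()))
```

The only structural difference from PCR is the line that computes `w`: PCR would take the direction of largest spectral variance, whereas PLS takes the direction of largest covariance with the concentration.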

MCR-ALS model
MCR constructs a regression model by evaluating the relationship of a variable with other variables [30,37]. This model extends the Beer-Lambert law into the multiwavelength domain [38]. MCR decomposes the spectral data matrix into a concentration-profile matrix and a pure spectral-profile matrix, and the matrix of residuals is then calculated. MCR-ALS iteratively estimates the concentrations of the proposed components from the spectral profiles. Constraints such as unimodality, closure, equality, and non-negativity can be applied during ALS optimization to model the shapes of the concentration and pure spectral profiles. On its own, the MCR-ALS framework yields pure resolved profiles in arbitrary units, without reference quantitative information [39]. This issue has been resolved by incorporating an internal calibration into the MCR-ALS model through the correlation constraint [39]. During ALS optimization, the order of the components could be permuted without changing the data matrix because of rotational ambiguity [37]. In this study, non-negativity constraints together with the correlation constraint were applied to both the concentration and spectral profiles. Convergence was considered to be reached when the convergence criterion equalled 0.1% [40]. Convergence was achieved after five iterations. The figures of merit of the optimization procedure, % lack of fit and percentage of explained variance (R²), were calculated and found to be 1.5625 and 99.9754, respectively, which were reasonable and supported the robustness of the developed MCR-ALS model (Fig. 4).
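The bilinear decomposition and its figures of merit can be sketched in a few lines of NumPy. The sketch below is a simplified illustration, not the authors' procedure: non-negativity is imposed by clipping rather than by a dedicated non-negative least-squares step, the correlation constraint is omitted (so the resolved profiles stay in arbitrary units), the stopping rule is assumed to be a relative change in lack of fit below 0.1%, and the data, initial estimates, and the name `mcr_als` are hypothetical. The lack of fit and explained variance follow the usual definitions based on the residual matrix E = D - C S.

```python
import numpy as np

rng = np.random.default_rng(2)
wavelengths = np.arange(220, 301)
centers = np.array([230.0, 245.0, 262.0, 275.0])       # hypothetical band maxima
S_true = 0.05 * np.exp(-0.5 * ((wavelengths - centers[:, None]) / 12.0) ** 2)
C_true = rng.uniform([4.0, 1.0, 2.5, 3.0], [20.0, 9.0, 7.5, 15.0], size=(25, 4))
D = C_true @ S_true + rng.normal(scale=1e-3, size=(25, 81))   # data matrix


def mcr_als(D, S0, max_iter=50, tol_percent=0.1):
    """Alternating least squares for D ~ C @ S with non-negativity by clipping."""
    S = S0.copy()
    lof_prev = np.inf
    for n_iter in range(1, max_iter + 1):
        C = np.clip(D @ np.linalg.pinv(S), 0.0, None)   # update concentrations
        S = np.clip(np.linalg.pinv(C) @ D, 0.0, None)   # update pure spectra
        E = D - C @ S                                   # residual matrix
        lof = 100.0 * np.sqrt(np.sum(E ** 2) / np.sum(D ** 2))
        # Assumed stopping rule: relative change in lack of fit below tol_percent.
        if abs(lof_prev - lof) / lof * 100.0 < tol_percent:
            break
        lof_prev = lof
    r2 = 100.0 * (1.0 - np.sum(E ** 2) / np.sum(D ** 2))
    return C, S, float(lof), float(r2), n_iter


# Initial spectral estimates: the simulated spectra perturbed by up to 10 %.
S0 = S_true * (1.0 + 0.1 * rng.random(S_true.shape))
C_hat, S_hat, lof, r2, n_iter = mcr_als(D, S0)
print(f"lack of fit = {lof:.4f} %, explained variance = {r2:.4f} %, iterations = {n_iter}")
```

With these definitions the two figures of merit are linked by R² = 100 (1 - (LOF/100)²), so a lack of fit of about 1.56% corresponds to an explained variance of about 99.976%, in line with the values reported above.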
The MCR-ALS model can estimate the spectral profiles of the drugs, which gives its results a qualitative meaning. The estimated spectra were closely similar to the original ones (Fig. 5). To calculate the concentrations of the studied components in the validation set, the related spectral and concentration profiles were recovered during the ALS optimization using one test sample at a time, as recommended for multivariate calibration models [41].
The predicted concentrations are presented in Table S1 with a good RMSEP value. Therefore, the MCR-ALS model offers the added benefit of qualitative component detection, in addition to its quantitative determination ability.
Fig. 2 Absorption spectra of 12.00 µg mL⁻¹ PARA (-), 1.00 µg mL⁻¹ CPM (….), 2.50 µg mL⁻¹ CAF (…), and 6.00 µg mL⁻¹ ASC (--), using methanol as solvent

ANN model
ANN is an artificial intelligence technique composed of a large number of simple, highly interconnected nodes, or artificial neurons, that mimic the function of the biological nervous system to identify correlations between inputs and outputs. ANN is a more effective option for modelling both linear and non-linear relationships between variables than other established multivariate approaches, such as PCR and PLS [11,42]. The ANN type trained in this research is the feed-forward model. The input layer used 81 neurons, corresponding to the number of spectral data points, and the output layer used four neurons, corresponding to the number of compounds to be determined in each sample.
The optimum number of neurons in the hidden layer was five using a purelin-purelin transfer function and 100 epochs. The mean squared error (MSE) performance of the fully trained ANN over epochs is shown in Fig. 6. The training MSE decreased steadily after epoch 0. Both the test and validation curves exhibited a similar pattern without abrupt variation. Figure 7 also shows the prediction diagrams for the training, test, and validation sets for the chosen layers and neurons.
To evaluate the predictive ability of the PCR, PLS, MCR-ALS, and ANN chemometric models, spectra from the validation set were used. The average recoveries and RSDs were computed for each component (Table 2), indicating favorable outcomes. Table 3 presents the regression and validation parameters for the validation sets used to quantify pure samples of PARA, CPM, CAF, and ASC. To judge the performance of the proposed multivariate calibration models, three statistical tests were applied. The first was the Durbin-Watson test, which assesses the correlation among prediction residuals [43][44][45]. The second was the root mean square error of prediction (RMSEP) [46]. RMSEP is widely recognized as a measure of prediction quality [46]. Its estimation plays a key role in the validation of multivariate calibration models, as it indicates both the accuracy and precision of the model [46]. The third was the elliptical joint confidence region (EJCR) test [47,48], which was conducted to compare the performance of ANN and MCR-ALS. The Durbin-Watson indicator showed a very low associated probability for each of the four analytes in the linear models (PCR, PLS) (Table 3), indicating that non-linearity was significant. Moreover, the non-linear ANN model had the lowest RMSEP and RMSEC (Table 3). The performance of ANN compared with MCR-ALS was statistically evaluated using the EJCR test, which showed no statistical difference between the two models (Figure S1). All results supported ANN as the model of choice for the quantitative determination of the studied drug mixture. Additionally, only the MCR-ALS model can separate the pure spectral profiles of the four components. As a result, it was suitable for both qualitative and quantitative analysis.
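Two of the statistics named above have simple closed forms, sketched here in plain Python. RMSEP is the square root of the mean squared difference between predicted and actual concentrations, and the Durbin-Watson statistic is the sum of squared successive differences of the residuals divided by the sum of squared residuals. The five concentration values are made up for illustration and are not taken from the validation set.

```python
import math


def rmsep(actual, predicted):
    """Root mean square error of prediction."""
    n = len(actual)
    return math.sqrt(sum((p - a) ** 2 for a, p in zip(actual, predicted)) / n)


def durbin_watson(residuals):
    """Durbin-Watson statistic for first-order correlation among residuals."""
    num = sum((residuals[i] - residuals[i - 1]) ** 2 for i in range(1, len(residuals)))
    den = sum(e ** 2 for e in residuals)
    return num / den


# Hypothetical validation data (ug/mL), for illustration only.
actual = [4.0, 8.0, 12.0, 16.0, 20.0]
predicted = [4.1, 7.9, 12.1, 15.9, 20.1]
residuals = [p - a for a, p in zip(actual, predicted)]

error = rmsep(actual, predicted)
dw = durbin_watson(residuals)
print(f"RMSEP = {error:.3f} ug/mL, Durbin-Watson = {dw:.2f}")
```

The Durbin-Watson statistic ranges from 0 to 4; values near 2 indicate no first-order correlation among residuals, while values toward 0 or 4 indicate positive or negative correlation, respectively. In this made-up example the residuals alternate in sign, so the statistic lies above 2.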

Assay of pharmaceutical dosage form
The developed chemometric models were used for the assessment of PARA, CPM, CAF, and ASC in their dosage form. Grippostad® C capsules contain PARA, CPM, CAF, and ASC in a challenging ratio (80:1:10:60), which permits the determination of PARA, CAF, and ASC without any interference from CPM, whereas CPM was quantified after spiking with 0.75 µg mL⁻¹ of the CPM standard working solution. The good percent recoveries, with standard deviations of less than 2 (Table 4), confirm the precise quantification of the four combined drugs in their pharmaceutical product.

Greenness assessment
To qualify as environmentally sustainable, analytical procedures must be refined by eliminating or reducing hazardous reagents, saving energy, and enhancing analyst safety. This refinement is required throughout the analytical process, from sample collection to analytical waste management [14,49]. Therefore, it is vital to appraise both the environmental impact and the possible repercussions on the workforce when measuring the eco-friendliness of analytical techniques. Various assessment methods have been devised to gauge the greenness of analytical procedures [15]. Two metrics were used to assess the greenness of the suggested technique: the Analytical Eco-Scale and the Analytical GREEnness metric (AGREE). The Analytical Eco-Scale score was determined by subtracting the total number of penalty points for the whole procedure from a base of 100. An excellent green analysis shows a score higher than 75 [26]. The developed technique achieved a high Eco-Scale score (85), indicating that it is an excellent green method of analysis (Table 5). However, the Analytical Eco-Scale does not supply comprehensive information about the assessed parameters. To provide more information, the most recent greenness assessment tool, AGREE, was applied [25]. The AGREE pictogram of the proposed models scored 0.77, indicating that the method is green (Fig. 8). This agrees with the literature, in which the closer the score is to one, the greener the method [25]. The proposed models showed an overall excellent greenness profile. Furthermore, the environmental impact of the proposed models was compared with that of the reported methods [21][22][23][24], as shown in Table 5 and Fig. 8.
Fig. 6 Best validation performance for the prediction of the ANN model
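The Analytical Eco-Scale arithmetic is a simple subtraction, sketched below. Only the total score of 85 and the "higher than 75" threshold come from this text; the individual penalty points are listed in Table 5 and are not reproduced here, so the sketch works with the lumped total. The lower bands in `classify` follow the original Analytical Eco-Scale proposal and are not stated in this article; the function names are illustrative.

```python
BASE_SCORE = 100


def eco_scale(penalty_points):
    """Analytical Eco-Scale score: base of 100 minus the sum of all penalty points."""
    return BASE_SCORE - sum(penalty_points)


def classify(score):
    """Bands from the original Eco-Scale proposal (only the >75 band is quoted in the text)."""
    if score > 75:
        return "excellent green analysis"
    if score > 50:
        return "acceptable green analysis"
    return "inadequate green analysis"


reported_score = 85
total_penalty = BASE_SCORE - reported_score     # lumped total implied by the reported score

print(total_penalty, classify(eco_scale([total_penalty])))
```

A reported score of 85 therefore implies that the reagents, energy use, occupational hazard, and waste of the whole procedure together accrued 15 penalty points.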

Statistical Analysis
The statistical analysis of the chemometric approaches developed in this study and the official methods [17,27], presented in Table 6, demonstrated that there was no significant disparity between the two in terms of accuracy and precision.

Conclusion
The continuous development of chemometrics enables the separation and analysis of chemical data beyond univariate analysis. Chemometrics is a potentially successful substitute for expensive chromatographic techniques. The proposed chemometric models proved capable of measuring PARA, CPM, CAF, and ASC simultaneously in a fast, simple, and consistent manner, with excellent sensitivity and reliability. The greenness of the developed models was considered during their early development stages. Subsequently, they were evaluated using the AGREE assessment and the penalty-point scoring system. Statistically, no significant differences were found between the established models and the official methods in terms of accuracy and precision. Thus, the proposed green multivariate models serve as a practical and environmentally conscious option for the routine analysis of PARA, CPM, CAF, and ASC in bulk powder and pharmaceutical formulations.
[a] British pharmacopeial method: Titrimetric method with 1 M cerium sulfate until a greenish-yellow color is obtained.
[b] USP pharmacopeial method: Non-aqueous titration using standard 0.1 M perchloric acid using Crystal violet as an indicator.
[c] USP pharmacopeial method: HPLC method using C 8 column with a mobile phase composed of sodium acetate solution: acetonitrile: tetrahydrofuran (191:5:4, by volume) at a flow rate of 1.0 mL/min and UV detection wavelength at 275.0 nm.

Fig. 4 (a) Percentage lack of fit and (b) variance percentage of MCR-ALS model

Fig. 7 Prediction for the training, test, and validation diagrams of the ANN model

Table 2
Prediction of validation set samples using the proposed chemometric models

Table 3
Performance parameters of the calibration and validation sets calculated for each proposed model

Table 4
Quantitative determination of PARA, ASC, CAF, and CPM in the dosage form by the proposed chemometric models
*Average of three determinations

Table 5
A comparison between the developed chemometric models and the reported methods using analytical Eco-scale

Table 6
Statistical comparison of the results obtained by the proposed chemometric models and the official methods for the determination of PARA, ASC, CAF, and CPM in their pure powdered form *[a][b][c]